INTRODUCTION


One of the most significant changes in theoretical computer science has been the recent infusion of methods and problems from combinatorial analysis. Among the most powerful combinatorial theorems which have been imported to computer science are those of extremal graph theory [1]. In extremal graph theory, one is interested in the largest (or, in complementary problems, the smallest) graph which avoids (or contains) a given structure. Purely combinatorial results (which have significance, e.g., for the design of circuit boards) have been obtained by Chung and Graham [2] and by Chung, Graham, and Pippenger [3]. In this paper, we extend this theory to encompass results concerning data structures.

As motivation for the results to be described, note that many of the large data structures manipulated by the programs described in [4,5] have two characteristics:

(i) they are sequentially accessed, and

(ii) many distinct structures are intertwined in the same physical memory.

For applications of this sort, it would obviously be desirable to have available a universal data structure in which all data structures from a given class may gracefully reside. In view of (i), by "graceful" we mean that the sequential accessing characteristics of the embedded data structures are not too drastically altered. Let us measure such alterations by the dilation of logical adjacencies [6,7] needed to embed all structures from a given class into a universal structure. This is then a complementary extremal graph theory problem: what is the size (number of edges) of the smallest universal graph for a given dilation factor?


The main results contained in this paper address such problems from a number of points of view.

(1) We give several asymptotically optimal universal data structures for graphs of n vertices when average dilation [7] is used as a measure.

(2) We discuss a universal data structure for graphs of n vertices when worst-case dilation is used as a measure [6].

(3) We consider variations of the average dilation measure which give favorable comparisons between data structures studied in [6,7].

(4) We consider the kinds of "sharing" that can take place between "almost linear" and "almost complete tree-like" structures.

(5) Finally, we propose a data structure embedding model which recovers some aspects of random accessing of data items. We also prove a space-time tradeoff which seems to indicate that no savings are possible in RAM models which assess accessing costs uniformly [8].


PRELIMINARIES 

A graph, G, is defined by its vertices, V(G), and edges, $E(G) \subseteq V(G) \times V(G)$. Edges are assumed to be undirected: a pair of vertices x, y are connected if either $(x,y) \in E(G)$ or $(y,x) \in E(G)$. A path $x_0, x_1, \ldots, x_n$ between $x_0$ and $x_n$, in which consecutive vertices are connected, is said to be of length n. The distance metric $d_G(x_0, x_n)$ is defined to be n if there is no shorter path than $x_0, \ldots, x_n$.

A graph represents a data structure in the obvious way: vertices represent nodes or records, and connectedness models logical adjacency.

The following relations and their significance for data structures can be found in [6,7]. Let G, G* be graphs. We say that G is T-worst case embeddable in G* ($G \le_T G^*$) if there is a one-one $\phi : V(G) \to V(G^*)$ such that $(x,y) \in E(G)$ implies

$$d_{G^*}(\phi(x), \phi(y)) \le T. \qquad (1)$$


Similarly, G is A-average case embeddable in G* ($G \le_A^{\mathrm{avg}} G^*$) if there is a one-one $\phi$ as above such that

$$\sum_{x,y \text{ connected}} d_{G^*}(\phi(x), \phi(y)) \le A \cdot |E(G)|. \qquad (2)$$
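As a concrete reading of (1) and (2), the following Python sketch computes the smallest T and A for which a given one-one map φ witnesses $G \le_T G^*$ and $G \le_A^{\mathrm{avg}} G^*$. It assumes that graphs are given as edge lists, with each undirected edge listed once, and it obtains distances in G* by breadth-first search. The function names are illustrative and are not taken from [6,7].

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances from source in an unweighted, undirected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def to_adjacency(vertices, edges):
    """Undirected adjacency sets: an edge (x, y) connects x and y in both directions."""
    adj = {v: set() for v in vertices}
    for x, y in edges:
        adj[x].add(y)
        adj[y].add(x)
    return adj


def edge_dilations(g_edges, host_vertices, host_edges, phi):
    """d_{G*}(phi(x), phi(y)) for every edge (x, y) of G."""
    if len(set(phi.values())) != len(phi):
        raise ValueError("phi must be one-one")
    host = to_adjacency(host_vertices, host_edges)
    cache = {}
    result = []
    for x, y in g_edges:
        if phi[x] not in cache:
            cache[phi[x]] = bfs_distances(host, phi[x])
        result.append(cache[phi[x]][phi[y]])  # KeyError if phi(y) is unreachable
    return result


def worst_case_dilation(g_edges, host_vertices, host_edges, phi):
    """Least T satisfying (1) for this phi."""
    return max(edge_dilations(g_edges, host_vertices, host_edges, phi))


def average_dilation(g_edges, host_vertices, host_edges, phi):
    """Least A satisfying (2) for this phi: total dilation divided by |E(G)|."""
    d = edge_dilations(g_edges, host_vertices, host_edges, phi)
    return sum(d) / len(d)


# A 4-vertex path mapped into a star with centre "c".
path_edges = [(0, 1), (1, 2), (2, 3)]
star_vertices = ["c", "a", "b", "d"]
star_edges = [("c", "a"), ("c", "b"), ("c", "d")]
phi = {0: "a", 1: "c", 2: "b", 3: "d"}
print(worst_case_dilation(path_edges, star_vertices, star_edges, phi))  # 2
print(average_dilation(path_edges, star_vertices, star_edges, phi))     # 1.3333333333333333
```

Only the edge (2, 3) is stretched, to the two leaves "b" and "d". This is why the worst case is 2 while the average is 4/3.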


In [4,5], comparisons between several natural classes of graphs give asymptotic bounds on T and A in (1) and (2) as functions of |V(G)|. Shortly after the announcement of the results of [6], R. M. Karp suggested to us the following class of problems connected with extremal graph theory: what are the characteristics of $\le_T$-universal data structures, i.e., those structures which T-worst case embed all graphs in a given class? This paper grew out of considering these problems.


UNIVERSAL  GRAPHS 

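Let $\mathcal{C}_n$ be a given class of graphs G with |V(G)| = n. Let us ask about a data structure which is $\le_T$- or $\le_A^{\mathrm{avg}}$-universal for $\mathcal{C}_n$. In particular, let us define

$$w(\mathcal{C}_n, T) = \min\{|E(G)| : G_n \le_T G \text{ for all } G_n \in \mathcal{C}_n\} \qquad (3)$$

and

$$a(\mathcal{C}_n, A) = \min\{|E(G)| : G_n \le_A^{\mathrm{avg}} G \text{ for all } G_n \in \mathcal{C}_n\}.$$

For T = 1, (3) becomes the complementary extremal graph problem studied in [2,3].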

By an n-tree G, we mean a connected acyclic graph G with |V(G)| = n. It is also convenient to think of trees as rooted in the following sense. Accompanying G, there is an ancestor-descendant relation that assigns direct ancestors and direct descendants to vertices in the obvious way, so that a vertex with no ancestors can be designated as the root of the tree. (Obviously this choice is not going to be unique, but we assume that G is not characterized until such a choice is made.) A d-ary n-tree is an n-tree in which each vertex has at most d direct descendants. We denote, respectively, the classes of n-trees and d-ary n-trees by $\Gamma_n$ and $\Gamma_n^d$.*

By [2] it is known that $\tfrac{1}{2} n \log n < w(\Gamma_n, 1)$, with an upper bound expressed in terms of $k(n) = \lfloor \log \log n \rfloor - 1$.†

The upper bound was improved in [3] to

$$w(\Gamma_n, 1) = O(n \log n \, (\log \log n)^2).$$

The bounds on $a(\Gamma_n, 1)$ are apparently not considered elsewhere.

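Superficially, at least, all interest in further characterization of (3) is destroyed by the following obvious

Theorem. For T ≥ 2,

$$w(\Gamma_n, T) = n - 1.$$

(Any one-one map of an n-tree onto the star with n vertices has dilation at most 2, since any two vertices of a star are within distance 2. Conversely, the images of an n-tree lie in one connected component with at least n vertices, which needs at least n - 1 edges.)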
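The star argument can be checked mechanically for small n. This Python sketch (all names illustrative) enumerates every labeled tree on n vertices through its Prüfer sequence. It maps each tree onto the star centred at vertex 0 by the identity, which uses n - 1 host edges, and confirms that the worst-case dilation never exceeds 2. With T = 1 the check fails, as it must.

```python
import heapq
from itertools import product


def prufer_to_edges(seq, n):
    """Decode a Prüfer sequence (length n - 2, labels 0..n-1) into a tree's edge list."""
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)  # smallest current leaf
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges


def star_dilation(tree_edges, center=0):
    """Worst-case dilation of the identity map into the star centred at `center`.

    An edge at the centre stays an edge; any other edge joins two leaves (distance 2).
    """
    return max(1 if center in (x, y) else 2 for x, y in tree_edges)


def star_is_universal(n, T=2):
    """True if every labeled tree on n >= 3 vertices T-worst case embeds in the star."""
    return all(
        star_dilation(prufer_to_edges(seq, n)) <= T
        for seq in product(range(n), repeat=n - 2)
    )


print(star_is_universal(6))       # True: all 6**4 = 1296 labeled trees
print(star_is_universal(6, T=1))  # False
```

The check depends on the target's unbounded degree, since the star's centre is adjacent to every other vertex. This is why the following paragraph restricts the target class.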

Of course, in (3), the "target" graph G may have unbounded degree. Therefore, it is natural to consider $w(\mathcal{C}_n, T, S)$ and $a(\mathcal{C}_n, A, S)$, where in both cases the target graph G is restricted to lie in the set S. Note that the theorem just cited is no longer obviously true.


* Thus $\Gamma_n^2$ = binary trees on n vertices.
† In the sequel, we use log x for $\log_2 x$ and ln x for $\log_e x$.
The best that is known is the upper bound of [3] (S = all cubic graphs):

$$w(\Gamma_n, 1, S) \lesssim \exp\left(\frac{\log^2 n}{2 \log 2}\right). \qquad (4)$$

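Suppose that (i) "targets" are restricted to binary trees and (ii) $w(\Gamma_n^2, T, \Gamma^2)$ is considered, with $\Gamma^2$ the class of all binary trees. It is not obvious that it is then possible to do any better than the union of all trees, which gives a structure of size on the order of $4^n$ up to polynomial factors.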

But we have the following

Theorem. For each T ≥ 1, there is a binary tree H such that $G \le_T H$ for all $G \in \Gamma_n^2$ and

$$\ln |E(H)| = O((\ln n)^2);$$

or, in other words,

$$w(\Gamma_n^2, T, \Gamma^2) = \exp\left(O((\ln n)^2)\right).$$

A key step in the proof of this theorem hinges on the solution to the fascinating "almost linear" recurrence

$$u_n = u_{n-1} + u_{\lfloor n/2 \rfloor}, \qquad (5)$$

first considered by Knuth [9]. This also establishes a connection between the theorem and inequality (4): with $u_0 = 1$, $u_n$ is also the number of partitions of 2n into parts of the form $2^i$, i = 0, 1, .... Knuth [9] bounds the partition function

$$P(m) \sim \frac{1}{4\sqrt{3}\, m} \exp\left(\pi \sqrt{2m/3}\right).$$
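Both descriptions of $u_n$ are easy to tabulate. This Python sketch computes (5) directly from the initial value $u_0 = 1$. It counts partitions into powers of 2 with the standard coin-change recurrence, so the identity $u_n = b(2n)$ can be checked numerically. Here b(m) is a name introduced only for this illustration, for the number of partitions of m into powers of 2.

```python
def almost_linear(n_max):
    """Tabulate (5): u_0 = 1 and u_n = u_{n-1} + u_{floor(n/2)} for n >= 1."""
    u = [1]
    for n in range(1, n_max + 1):
        u.append(u[n - 1] + u[n // 2])
    return u


def binary_partitions(m_max):
    """b[m] = number of partitions of m into parts 1, 2, 4, 8, ... (coin-change DP)."""
    b = [1] + [0] * m_max
    part = 1
    while part <= m_max:
        for m in range(part, m_max + 1):
            b[m] += b[m - part]
        part *= 2
    return b


u = almost_linear(100)
b = binary_partitions(200)
assert all(u[n] == b[2 * n] for n in range(101))
print(u[:9])  # [1, 2, 4, 6, 10, 14, 20, 26, 36]
```

The identity follows from two observations. First, a partition of an odd number must contain a 1, so b(2n - 1) = b(2n - 2). Second, the partitions of 2n either contain a 1 or use only even parts, so b(2n) = b(2n - 1) + b(n).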



There are two possibilities for improving the bounds on $w(\Gamma_n^2, T, \Gamma^2)$. The first possibility is to introduce circuits into the target graph of the previous theorem, but this does not appear to give an asymptotically better bound than (4). The second possibility is to prove that balanced trees and unbalanced trees are $\le_T$-equivalent. This seems unlikely, since combining such a result with the proof method of the previous theorem gives a polynomial-sized universal tree. However, in trying to improve the bounds on $w(\Gamma_n^2, T, \Gamma^2)$, it may be desirable to ignore irregular trees, letting only very balanced or very unbalanced trees reside in the same universal data structure.

In any case, it seems unlikely that polynomial structures are possible. We are, however, far from proving this; indeed, the best known lower bound is the following

Theorem. For all n > N,

$$w(\Gamma_n^2, T, \Gamma^2) > c(T)\, n \log n,$$

where c(T) > 0 is a constant for fixed T ≥ 1.

Certain other subcases are also of interest. Erdős, Chung, and Graham† consider w(S, 1) and obtain

$$w(S, 1) < n^2.$$

The following theorem is an improvement, but it is surely not the best possible bound.

Theorem 

w(S,l)  < | n2 


t Private  Communication. 
A non-trivial lower bound would clearly be desirable. Another class of interest is graphs of high genus.†† We conjecture that for graphs of fixed genus γ it is possible to do better than the naive $\binom{n}{2}$ bound obtained by embedding in the complete graph.

Our next series of results shows impressive improvements obtained by passing to average dilations. We now get optimal constructions, even in a variety of limited settings.

We have, for instance, the

Theorem. For α > 0,

$$a(\Gamma_n^2, \alpha, S) = O(n \log(2 + \alpha)).$$

Since there is a linear lower bound on $a(\cdot, \cdot, \cdot)$, this construction is optimal. A slight modification of the construction gives $a(\Gamma_n^2, A, S) = O(n)$ for all A ≥ 1, but this result may be superseded by the following

Theorem. For each A ≥ 1, there is a binary tree H such that

$$G \le_A^{\mathrm{avg}} H$$

for all $G \in \Gamma_n^2$, and

$$|E(H)| = O(n);$$

or, in other words,

$$a(\Gamma_n^2, A, \Gamma^2) = O(n).$$


†† A graph is of genus γ if it can be embedded in a sphere with γ handles [10].
These results are related to the ability to "cut" graphs in advantageous ways. For example, Lipton and Tarjan obtained a generalization of the planar separator theorem [11] to graphs of higher genus, which gives us the following

Theorem. Let $L_n^\gamma$ be the class of graphs G with genus γ and |V(G)| = n. Then, for all n > N,

$$a(L_n^\gamma, A, \Gamma^2) < c(A)\, n,$$

where c(A) does not depend on n.

EXTENDED MODEL

In comparing classes of data structures (see, e.g., [6,7]), the measures of "efficiency" have implicitly assumed that only sequential accessing is important. Thus, in [6] we bound the efficiency, T, of an embedding of the n × n array into binary trees by

$$T \ge c \log n;$$

there, the function T(n) captures the dilation factor in an embedding. We now describe a generalization of this concept which recovers a certain kind of random accessing. Since the precise definitions are quite complex, we will settle for a less exact, but more picturesque, rendering. Let us assume that we have in front of us an illustration of a graph G, and also a number of friends who agree to lend us their forefingers for use in tracing the paths of the graph. Our friends oblige us as follows: we may start traversing at any vertex already visited. The traversal rule is, then, that we must either traverse graph edges or "jump" to a vertex pointed to by a friend. The time required to traverse a sequence of vertices is then simply the number of applications of traversal rules. Notice that the result of a traversal is not necessarily a path of G. The connection between fingers and random accessing is that traversals requiring k fingers also require k "addresses" for the vertices pointed to.
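The finger rules above are stated only informally. The following Python sketch uses one possible reading, which is an assumption and not the precise definition alluded to in the text: the fingers rest on a fixed set of vertices, and each step either follows an edge or jumps to a finger. Under that reading, traversal time is a breadth-first search in the graph augmented with jump moves. The heap-numbered complete binary tree, in which the children of i are 2i and 2i + 1, is also our own illustrative choice.

```python
from collections import deque


def complete_binary_tree(depth):
    """Adjacency sets of a complete binary tree with heap numbering 1..2**(depth+1)-1."""
    n = 2 ** (depth + 1) - 1
    adj = {i: set() for i in range(1, n + 1)}
    for i in range(2, n + 1):
        adj[i].add(i // 2)   # parent of i
        adj[i // 2].add(i)
    return adj


def finger_traversal_time(adj, fingers, start, goal):
    """Fewest rule applications from start to goal, where one step either follows a
    graph edge or jumps to a vertex in `fingers` (hypothetical reading of the rules)."""
    fingers = set(fingers)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            return dist[u]
        for v in adj[u] | fingers:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None  # goal not reachable


tree = complete_binary_tree(3)                     # 15 vertices
print(finger_traversal_time(tree, [], 8, 15))     # 6: the tree path through the root
print(finger_traversal_time(tree, [1], 8, 15))    # 4: jump to the root, then walk down
print(finger_traversal_time(tree, [7], 8, 15))    # 2: jump to 7, then one edge
```

With no fingers the traversal time is just $d_H$. Each finger acts like an address that shortcuts the tree path, which is the sense in which fingers recover a kind of random accessing.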

We then say that $G \le_{k,T} G^*$ if there is a one-one $\phi : V(G) \to V(G^*)$ with the following property. For every $x, y \in V(G)$ with $d_G(x,y) = m$, there is a k-finger traversal from $\phi(x) = x^*$ to $\phi(y) = y^*$ with time at most $\lambda$, where $\lambda \le T\, m$.

We have the following

Theorem. If $G_n$ is the $n \times n$ array [7], H is a binary tree, and $G_n \le_{k,T(n)} H$, then

$$k + T(n) \ge c \log n,$$

where c is a constant independent of n.


OTHER  TYPES  OF  AVERAGE  EMBEDDING 


The relation $\le_A^{\mathrm{avg}}$ may be thought of as averaging the dilation over the edges of G, with relative frequencies uniformly distributed over E(G). We now make a more global definition which may be used to recover our intuitions about path lengths in binary trees [7]. We will essentially average over all shortest paths:

$G \le_A^{\mathrm{paths}} G^*$ if there is an embedding $\phi : V(G) \to V(G^*)$ such that

$$\sum_{x,y} d_{G^*}(\phi(x), \phi(y)) \le A \cdot \sum_{x,y} d_G(x,y).$$
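To see how the two averages differ, the Python sketch below evaluates both ratios for a fixed embedding. The sums are taken over unordered pairs, which leaves the ratio unchanged. The sketch assumes that G is connected and that graphs are given as adjacency dictionaries; the names are illustrative. In the example, a 4-vertex path is mapped into a star. The edge average exceeds 1, while the paths ratio drops below 1, because the star's centre shortcuts the long paths of G.

```python
from collections import deque
from itertools import combinations


def all_distances(adj):
    """All-pairs shortest-path distances via one BFS per vertex (unweighted, undirected)."""
    table = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        table[s] = dist
    return table


def edge_average(g_adj, h_adj, phi):
    """Least A with G <=_A^avg H under phi: mean host distance over the edges of G."""
    dh = all_distances(h_adj)
    edges = {frozenset((x, y)) for x in g_adj for y in g_adj[x]}
    total = sum(dh[phi[x]][phi[y]] for x, y in (tuple(e) for e in edges))
    return total / len(edges)


def path_average(g_adj, h_adj, phi):
    """Least A with G <=_A^paths H under phi: summed host distances divided by
    summed G distances, over all unordered pairs of vertices of G."""
    dg, dh = all_distances(g_adj), all_distances(h_adj)
    pairs = list(combinations(g_adj, 2))
    return sum(dh[phi[x]][phi[y]] for x, y in pairs) / sum(dg[x][y] for x, y in pairs)


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
star = {"c": {"a", "b", "d"}, "a": {"c"}, "b": {"c"}, "d": {"c"}}
phi = {0: "a", 1: "c", 2: "b", 3: "d"}
print(edge_average(path, star, phi))  # 1.3333333333333333
print(path_average(path, star, phi))  # 0.9
```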
We then have the following

Theorem. For each $n \ge 0$, let $A_n$ be the least real number such that

$$G_n \le_{A_n}^{\mathrm{paths}} H$$

for a binary tree H. Then

Thus, we see that if the average embedding is required to work well on all shortest paths, then the embedding cost goes to zero. In a sense, then, $\le_A^{\mathrm{avg}}$ "charges" more heavily than $\le_A^{\mathrm{paths}}$ for any bottlenecks.